Processor cores are divided into two categories: fast and power-hungry out-of-order processors, and efficient, but slower in-order processors. To achieve high performance with low-energy budgets, this proposal aims to deliver out-of-order processing by software (SWOOP) on in-order architectures. Problem: A primary cause for slowdown in in-order processors is last-level cache misses (caused by difficult to predict data-dependent loads), resulting in cores stalling. Solution: As loads are non-blocking operations, independent instructions are scheduled to run before the loads return. We execute critical load instructions earlier in the program for a three-fold benefit: increasing memory and instruction level parallelism, and hiding memory latency. Related work: Some instruction scheduling policies attempt to hide memory latency, but scheduling is confined by basic block limits and register pressure. Software pipelining is restricted by dependencies between instructions and decoupled access-execute (DAE) suffers from address re-computation. Unlike EPIC (evolved from VLIW), SWOOP does not require hardware support for predicated execution, speculative loads and their verification, delayed exception handling, memory disambiguation etc.
展开▼